NONQUADRATIC ESTIMATORS OF A QUADRATIC
FUNCTIONAL

By T. Tony Cai and Mark G. Low 

University of Pennsylvania 

Estimation of a quadratic functional over parameter spaces that 
are not quadratically convex is considered. It is shown, in contrast 
to the theory for quadratically convex parameter spaces, that opti- 
mal quadratic rules are often rate suboptimal. In such cases minimax 
rate optimal procedures are constructed based on local threshold- 
ing. These nonquadratic procedures are sometimes fully efficient even 
when optimal quadratic rules have slow rates of convergence. More- 
over, it is shown that when estimating a quadratic functional non- 
quadratic procedures may exhibit different elbow phenomena than 
quadratic procedures. 

1. Introduction. The Gaussian sequence model
(1) $Y_i = \theta_i + n^{-1/2} z_i, \quad i = 1, 2, \ldots,$

where $z_i$ are i.i.d. standard normal random variables, often serves as a
general prototypical model in nonparametric function estimation settings. For
example, it is exactly equivalent to a white noise with drift model and can
also be used to approximate nonparametric regression and density estimation
models. For the sequence model considerable attention has focused on
estimating linear and nonlinear functionals of the infinite dimensional mean
vector $\theta = (\theta_1, \theta_2, \ldots)$.

One particularly important nonlinear functional is the quadratic functional
$Q(\theta) = \sum_{i=1}^{\infty} \theta_i^2$. Early results on this and related problems were given
in [3, 11, 15, 17, 18, 19]. More recent results can be found in [13, 22, 23].

The problem of estimating this quadratic functional is closely connected to 
the construction of confidence balls in nonparametric function estimation.
See, for example, [6, 12, 14, 25]. In addition, as shown in [3, 11, 15], this
problem connects the nonparametric and semiparametric literatures. 

One of the interesting features of the quadratic functional estimation 
problem is that the usual information bound over any bounded subset of 
$\ell_2$, as, for example, given in [2], is strictly positive and finite for $\theta \neq 0$. However,
this bound may or may not be useful. Bickel and Ritov [3] and Ritov
and Bickel [26] showed in the context of i.i.d. data that in some cases the 
information bound is sharp whereas in other cases the information bound 
is not informative because the minimax rate of convergence is slower than 
the usual parametric rate. This phenomenon is often known as the elbow 
phenomenon. 

Donoho and Nussbaum [11] and Fan [15] further developed this theory for 
orthosymmetric quadratically convex parameter spaces such as hyperrectan- 
gles or Sobolev balls. In particular, the minimax theory was fully developed 
in these cases. The elbow phenomenon also occurs in these more general 
settings. Moreover, quadratic rules occupy a particularly important position 
in this theory: simple quadratic rules can always be constructed which are 
minimax rate optimal. 

In this paper we focus on the problem of estimating the quadratic functional
$Q(\theta)$ over parameter spaces which are not quadratically convex where,
as we shall show, quadratic rules are no longer sufficient for minimax esti- 
mation. In particular, we explore when the information bound is sharp and 
when nonquadratic rules are needed to attain the bound. An estimator is 
called fully efficient if it attains the information bound asymptotically and 
we say that fully efficient estimation is possible when such an estimator exists.
We also consider specific examples of parameter spaces which are not
quadratically convex, namely Besov balls $B^\alpha_{p,q}(M)$ and $L_p$ balls $L_p(\alpha, M)$
with $p < 2$. These spaces, defined in Section 2, provide a rich collection of
possible parameter spaces. For these spaces we characterize the elbow phe- 
nomenon for the performance of optimal quadratic procedures and that of 
general minimax procedures. In particular, we show that over these spaces,
when the optimal quadratic procedure does not attain the usual parametric
rate, minimax rate optimal rules must be nonquadratic.

The paper is organized as follows. In Section 2 we first consider the per- 
formance of quadratic procedures over general orthosymmetric parameter 
spaces. It is known that when the parameter space is quadratically convex 
optimal quadratic procedures are near minimax. Such an analysis has how- 
ever not been given for parameter spaces that are not quadratically convex. 
In fact, as we show, the near minimaxity of optimal quadratic rules typically 
does not hold when the parameter space is not quadratically convex. It is 
shown that the maximum risk of quadratic procedures over any parameter 
space is equal to the maximum risk over the quadratic convex hull. It also
follows from our results that for Besov balls and $L_p$ balls with $p < 2$ quadratic
rules can be minimax rate optimal only if the minimax quadratic risk is
of order $n^{-1}$. For Besov balls and $L_p$ balls the minimax quadratic risk also
exhibits the well-known elbow phenomenon. We show that there is a fully
efficient quadratic procedure as long as $\alpha > \frac{1}{p} - \frac{1}{4}$, whereas if $\alpha < \frac{1}{p} - \frac{1}{4}$ optimal
quadratic rules have maximum risk of order $n^{-8s/(1+4s)}$ where $s = \alpha + \frac{1}{2} - \frac{1}{p}$.

In Section 3 we develop nonquadratic procedures for estimating the quadratic
functional $Q(\theta)$ over Besov balls and $L_p$ balls with $p < 2$. We show that
optimal nonquadratic procedures exhibit a different elbow phenomenon. A
local thresholding estimator is constructed and is shown to be fully efficient
over Besov balls and $L_p$ balls with $p < 2$ and $\alpha > \frac{1}{2p}$. Hence when $p < 2$
and $\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$ there are fully efficient nonquadratic estimators while
all quadratic rules are rate suboptimal.

Section 3 also considers estimating $Q(\theta)$ over Besov balls and $L_p$ balls with
$p < 2$ and $\alpha \le \frac{1}{2p}$. In this case it is shown that the minimax rate of
convergence is $n^{-(2 - p/(1+2ps))}$ where $s = \alpha + \frac{1}{2} - \frac{1}{p}$, and hence optimal quadratic
rules are once again suboptimal since $2 - \frac{p}{1+2ps} > \frac{8s}{1+4s}$. A nonquadratic
estimator is constructed which has risk within a constant factor of the minimax
risk.

A distinct feature of the case p < 2 is that the hardest hyperrectangle 
submodel is not as difficult as the full model. In contrast, in the dense case 
of p > 2 hyperrectangle submodels can be chosen which yield not only useful 
lower bounds but also lead to rate optimal quadratic procedures. See [11]. For 
p < 2 the worst case can be captured by a mixture prior supported on a large 
collection of hyperrectangles. Lower bounds are developed in Section 3.3 
based on this mixture prior. Local thresholding procedures which capture 
any large coefficients are shown to be within a constant factor of these lower 
bounds. 

Section 4 briefly considers the adaptation problem for some special cases. 
Attention is focused only on adaptive estimation across a collection of pa- 
rameter spaces over which the minimax rates of convergence are equal. In 
particular, for the collection of all Besov spaces for which fully efficient 
estimation is possible a procedure based on term by term thresholding is 
constructed and is shown to be simultaneously fully efficient over every pa- 
rameter space in this collection. On the other hand, for a fixed nonparametric 
rate of convergence another estimator is constructed which is simultaneously 
rate optimal over all Besov spaces with that given minimax rate of conver- 
gence. The general case of adaptation over parameter spaces with different 
minimax rates of convergence is an interesting but challenging problem. A 
complete treatment is given in [7]. 

Connections between the problem of estimating quadratic functionals
and a corresponding testing problem are made in Section 5. This testing
problem was first studied in [20], and Lepski and Spokoiny [24] developed
minimax tests for Besov spaces with $\frac{1}{\alpha} < p < 2$. We show that results developed
for the estimation problem in Section 3 extend the theory of testing to
cases not previously considered.

Section 6 is devoted to a discussion of connections with other related non- 
parametric function estimation problems, namely those of global estimation 
under sum of squared error loss and estimating linear functionals. For ex- 
ample, in global estimation it is known that simple thresholding procedures 
can yield minimax rate optimal procedures over spaces where a few rela- 
tively large coefficients may otherwise lead to a large bias. Proofs are given 
in Section 7. 

2. Performance of quadratic procedures. As mentioned in the Introduc- 
tion, quadratic procedures have received particular attention in the theory of 
estimating quadratic functionals. They have been shown to work well when 
the parameter space is orthosymmetric and quadratically convex. Most com- 
mon parameter spaces, such as Besov balls and L p balls, are orthosymmetric. 
In particular, unconditional bases such as wavelet bases transform common 
function spaces into an orthosymmetric sequence space. See [28]. However, 
many of these spaces are not quadratically convex. In such cases the perfor- 
mance of quadratic rules has not been studied. In this section we study the 
performance of quadratic procedures over general orthosymmetric parame- 
ter spaces. In addition, we consider in detail estimation over Besov balls and 
L p balls with p < 2. 

2.1. General orthosymmetric parameter spaces. Before studying the per- 
formance of quadratic procedures over general orthosymmetric parameter 
spaces it is convenient to introduce some notation. Write $\mathcal{Q}$ for the collection
of all quadratic rules, namely those of the form

$\sum_{i,j} a_{i,j} Y_i Y_j + c.$

Also write $\mathcal{Q}_D$ for the subclass of diagonal quadratic rules, namely those
of the form $\sum_i a_i Y_i^2 + c$. A parameter space $\Theta$ is called orthosymmetric if
$\theta = (\theta_1, \theta_2, \ldots, \theta_m, \ldots) \in \Theta$ implies that $(\pm\theta_1, \pm\theta_2, \ldots, \pm\theta_m, \ldots) \in \Theta$ for any
choices of the signs $\pm$. An orthosymmetric set $\Theta$ is called quadratically
convex if the set $\{(\theta_i^2)_{i=1}^{\infty} : \theta \in \Theta\}$ is convex.

Write the minimax risk for estimating $Q(\theta) = \sum \theta_i^2$ as

(2) $R^*(n, \Theta) = \inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2$

and the minimax quadratic risk and minimax diagonal quadratic risk as

(3) $R^*_Q(n, \Theta) = \inf_{\hat Q \in \mathcal{Q}} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2$ and

(4) $R^*_{DQ}(n, \Theta) = \inf_{\hat Q \in \mathcal{Q}_D} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2.$

The problem of estimating quadratic functionals has usually assumed that
the parameter space is both orthosymmetric and quadratically convex.
Orthosymmetry allows a minimax analysis of general quadratic rules to focus
on diagonal quadratic rules. More specifically, for any quadratic rule, say
$\hat Q = \sum a_{i,j} Y_i Y_j + c$, define $\hat Q' = \sum a_{i,i} Y_i^2 + c$. Fan [15] showed that for any
orthosymmetric set $\Theta$

(5) $\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \ge \sup_{\theta \in \Theta} E_\theta(\hat Q' - Q(\theta))^2.$

In particular, it follows that $R^*_Q(n, \Theta) = R^*_{DQ}(n, \Theta)$ when $\Theta$ is
orthosymmetric.

For the analysis of quadratic procedures over general orthosymmetric
parameter spaces it is convenient and natural to introduce the quadratic convex
hull. For an orthosymmetric set $\Theta$, the quadratic convex hull of $\Theta$ is defined
as

(6) $\mathrm{Q.Hull}(\Theta) = \{(\theta_i)_{i=1}^{\infty} : (\theta_i^2)_{i=1}^{\infty} \in \mathrm{Hull}(\Theta_+^2)\},$

where $\Theta_+^2 = \{(\theta_i^2)_{i=1}^{\infty} : (\theta_i)_{i=1}^{\infty} \in \Theta, \ \theta_i \ge 0 \ \forall i\}$ and $\mathrm{Hull}(\Theta_+^2)$ denotes the
convex hull of the set $\Theta_+^2$. The following theorem characterizes the performance
of quadratic rules over an orthosymmetric parameter space.

Theorem 1. Let $\hat Q \in \mathcal{Q}_D$ be a diagonal quadratic estimator of $Q(\theta) = \sum \theta_i^2$.
Then for any orthosymmetric $\Theta$,

(7) $\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 = \sup_{\theta \in \mathrm{Q.Hull}(\Theta)} E_\theta(\hat Q - Q(\theta))^2.$

Consequently the minimax quadratic risk over an orthosymmetric set $\Theta$
equals the minimax quadratic risk over the quadratic convex hull of $\Theta$, that
is,

(8) $R^*_Q(n; \Theta) = R^*_Q(n; \mathrm{Q.Hull}(\Theta)) = R^*_{DQ}(n; \mathrm{Q.Hull}(\Theta)).$

Theorem 1 shows that the performance of the optimal quadratic procedure
is captured by the minimax quadratic risk over the quadratic convex hull of
the parameter space $\Theta$. If in addition $\mathrm{Q.Hull}(\Theta)$ is norm bounded in $\ell_2$ and
convex it follows from Donoho and Nussbaum [11] that
$R^*_Q(n; \mathrm{Q.Hull}(\Theta)) \asymp R^*(n; \mathrm{Q.Hull}(\Theta))$
and hence $R^*_Q(n; \Theta) \asymp R^*(n; \mathrm{Q.Hull}(\Theta))$. When $\Theta$ is not
quadratically convex $\mathrm{Q.Hull}(\Theta)$ is larger than $\Theta$ and in some cases, as we
shall discuss below, $R^*(n; \mathrm{Q.Hull}(\Theta)) \gg R^*(n; \Theta)$. Consequently the optimal
quadratic procedure can sometimes have a slower rate of convergence than
the minimax rate. As we shall show, such is the case for certain Besov balls
and $L_p$ balls.
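One way to see why (7) is plausible, offered here as a brief sketch and not as the proof given in Section 7: for a diagonal rule the risk depends on $\theta$ only through the squares $u_i = \theta_i^2$, and it is a convex function of $u = (u_i)$. The symbols $u_i$ and $b$ below are introduced only for this illustration, and all sums are assumed to converge.

```latex
% Sketch only: risk of a diagonal quadratic rule written in terms of u_i = \theta_i^2.
% Uses E Y_i^2 = \theta_i^2 + 1/n and Var(Y_i^2) = 2/n^2 + 4\theta_i^2/n
% for independent Y_i ~ N(\theta_i, 1/n).
\begin{align*}
\hat Q &= \sum_i a_i Y_i^2 + c, \qquad u_i = \theta_i^2, \qquad b = \frac{1}{n}\sum_i a_i + c, \\
E_\theta \hat Q - Q(\theta) &= \sum_i (a_i - 1)\,u_i + b, \\
\operatorname{Var}_\theta(\hat Q) &= \sum_i a_i^2 \Big( \frac{2}{n^2} + \frac{4u_i}{n} \Big), \\
E_\theta(\hat Q - Q(\theta))^2 &= \Big( \sum_i (a_i - 1)\,u_i + b \Big)^2
  + \sum_i a_i^2 \Big( \frac{2}{n^2} + \frac{4u_i}{n} \Big).
\end{align*}
```

The right-hand side is the square of an affine function of $u$ plus an affine function of $u$, hence convex in $u$. The supremum of a convex function over a set equals its supremum over the convex hull of that set, which is the content of (7) once the hull is taken in the coordinates $u_i = \theta_i^2$.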
2.2. Besov balls and $L_p$ balls. We now consider as an example Besov
balls and $L_p$ balls. The $L_p$ balls are defined by

(9) $L_p(\alpha, M) = \{\theta : (\sum_{i=1}^{\infty} i^{ps} |\theta_i|^p)^{1/p} \le M\},$

where $s = \alpha + \frac{1}{2} - \frac{1}{p} > 0$. Besov balls in sequence space are typically defined
in terms of a doubly indexed sequence $\{\theta_{j,k} : j = 0, 1, \ldots, \ k = 0, \ldots, 2^j - 1\}$.
The Besov balls are then defined by

(10) $B^\alpha_{p,q}(M) = \{\theta : (\sum_{j=0}^{\infty} (2^{js} (\sum_{k=0}^{2^j - 1} |\theta_{j,k}|^p)^{1/p})^q)^{1/q} \le M\},$

where $s = \alpha + \frac{1}{2} - \frac{1}{p} > 0$. So that we can give a unified treatment of Besov
balls and $L_p$ balls it is convenient for Besov balls to set $\theta_i = \theta_{j,k}$ where
$i = 2^j + k$. Noisy observation of Besov coefficients can then still be written
as in (1). This convention is used throughout the paper. In addition we shall
assume throughout the paper that $p, q, \alpha, s > 0$.

Previous literature has focused primarily on quadratically convex parameter
spaces such as Besov balls $B^\alpha_{p,q}(M)$ and $L_p$ balls $L_p(\alpha, M)$ with $p \ge 2$.
In particular, Fan [15] gave an analysis for $L_p$ balls with $p \ge 2$ which shows
that for the parameter space $\Theta = L_p(\alpha, M)$ the minimax risk satisfies

(11) $\inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \asymp n^{-r(\alpha)},$

where $r(\alpha) = 1$ when $\alpha \ge \frac{1}{4}$ and $r(\alpha) = \frac{8\alpha}{1+4\alpha}$ when $\alpha < \frac{1}{4}$. An entirely
analogous analysis yields the same result when $\Theta = B^\alpha_{p,q}(M)$ for $p \ge 2$. Moreover,
Fan [15] gave simple quadratic estimators attaining these minimax rates
of convergence over $L_p$ balls. Estimating quadratic functionals over Besov
spaces was also considered in [23] where the focus was on adaptive estimation
of more general quadratic functionals using model selection.

As pointed out in [11] and [15], one important aspect of the quadratically 
convex L p balls is that the difficulty of estimating a quadratic functional is 
then captured by the hardest hyperrectangle subproblem. This reduction is 
instrumental in developing a sharp lower bound as well as in the construction 
of the optimal quadratic rule. 

Our focus is on Besov balls and L p balls with p < 2, in which case the 
parameter spaces are no longer quadratically convex. The standard tech- 
nique of finding the hardest hyperrectangle subproblem is then no longer 
sufficient. In fact, quadratic rules are in general suboptimal and the hardest 
hyperrectangle subproblem need not be as difficult as the full model. Nev- 
ertheless the performance of optimal quadratic rules is easy to characterize 
by the results given in Theorem 1 and an understanding of the quadratic 
convex hulls of general Besov balls and $L_p$ balls. In fact, when $p < 2$ it is
easy to check that

(12) $\mathrm{Q.Hull}(L_p(\alpha, M)) = L_2(\alpha + \frac{1}{2} - \frac{1}{p}, M).$

See [10]. Similarly it is easy to check that

(13) $\mathrm{Q.Hull}(B^\alpha_{p,q}(M)) = B^{\alpha + 1/2 - 1/p}_{2,q}(M).$

Write $r^*(\Theta)$ for the exponent $a$ whenever $R^*(n, \Theta) \asymp n^{-a}$ and similarly
$r^*_Q(\Theta)$ for the exponent $b$ whenever $R^*_Q(n, \Theta) \asymp n^{-b}$. The following result
is then a direct consequence of (11)-(13) and Theorem 1.

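Corollary 1. Let $0 < p < 2$. Then $r^*_Q(L_p(\alpha, M)) = r^*_Q(B^\alpha_{p,q}(M)) = \min\{1, \frac{8s}{1+4s}\}$,
or equivalently,

(14) $r^*_Q(L_p(\alpha, M)) = r^*_Q(B^\alpha_{p,q}(M)) = 1$ when $\alpha \ge \frac{1}{p} - \frac{1}{4}$, and
     $r^*_Q(L_p(\alpha, M)) = r^*_Q(B^\alpha_{p,q}(M)) = \frac{8s}{1+4s}$ when $\alpha < \frac{1}{p} - \frac{1}{4}$.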

The corollary clearly shows the elbow phenomenon for the minimax quadratic
rate of convergence. There is a break between the usual parametric rate
of convergence and slower rates of convergence at $\alpha = \frac{1}{p} - \frac{1}{4}$. We shall show
later that the break for the minimax risk for nonquadratic procedures is at
a smaller value of $\alpha$. This is illustrated in Figure 1 for the case of $p = 1.25$.

When $p < 2$ and $\alpha > \frac{1}{p} - \frac{1}{4}$ it is in fact possible to find a simple procedure
which is efficient, asymptotically attaining the exact minimax risk. Let m = 
t^— and set 



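(15) $\hat Q_1 = \sum_{i=1}^{m} (Y_i^2 - \frac{1}{n}).$

Then simple calculations and lower bounds given in Section 3 yield

(16) $\sup_{\theta \in \Theta} E_\theta(\hat Q_1 - Q(\theta))^2 = R^*(n, \Theta)(1 + o(1)) = \frac{4M^2}{n}(1 + o(1)),$

where $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$ with $p < 2$ and $\alpha > \frac{1}{p} - \frac{1}{4}$.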
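The "simple calculations" behind (16) can be made concrete. The following is a minimal sketch, assuming the quadratic rule in (15) is the truncated sum of $Y_i^2 - 1/n$ over $i \le m$ under model (1); the function name `truncated_quadratic_risk` and the example vectors are illustrative choices, not taken from the paper. It returns the exact bias, variance and mean squared error for a given mean vector.

```python
def truncated_quadratic_risk(theta, n, m):
    """Exact bias, variance and MSE of sum_{i<=m} (Y_i^2 - 1/n) as an
    estimator of Q(theta) = sum_i theta_i^2, with Y_i ~ N(theta_i, 1/n).

    theta is a finite list; coordinates beyond its end are taken to be zero
    (an assumption made only to keep the example finite).
    """
    head = theta[:m]   # coordinates the quadratic rule uses
    tail = theta[m:]   # coordinates it ignores
    # Each retained term is unbiased for theta_i^2, so the bias is minus the tail energy.
    bias = -sum(t * t for t in tail)
    # Var(Y_i^2) = 2/n^2 + 4 theta_i^2 / n for Y_i ~ N(theta_i, 1/n).
    variance = sum(2.0 / n**2 + 4.0 * t * t / n for t in head)
    # Zero-padded coordinates below m still contribute pure noise variance.
    variance += max(0, m - len(theta)) * 2.0 / n**2
    return bias, variance, bias**2 + variance


if __name__ == "__main__":
    # A single spike in the head: the risk is 2m/n^2 + 4 theta_1^2 / n.
    print(truncated_quadratic_risk([1.0] + [0.0] * 19, n=100, m=10))
    # The same kind of spike just beyond m enters through the squared bias instead.
    print(truncated_quadratic_risk([0.0] * 10 + [0.5], n=100, m=10))
```

The second call illustrates the weakness discussed in the next section: one moderately large coefficient beyond the cutoff contributes its full fourth power to the risk of a quadratic rule.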

3. Nonquadratic estimators. In this section we focus on the construction
of a new class of nonquadratic estimators which significantly outperforms
the optimal quadratic rules for Besov balls and $L_p$ balls when $p < 2$ and
$\alpha < \frac{1}{p} - \frac{1}{4}$. In this case the minimax quadratic risk converges more slowly
than the minimax risk and the result shows that quadratic rules are far from
optimal.

We shall consider two separate cases. In the first the nonquadratic estimator
is fully efficient over Besov balls and $L_p$ balls when $p < 2$ and
$\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$, whereas the best quadratic estimator does not even achieve
the usual parametric rate. In the second case with $p < 2$ and $\alpha \le \frac{1}{2p}$ the
nonquadratic estimator has risk converging faster than the minimax quadratic
risk. We then derive minimax lower bounds in Section 3.3 which show that
the risk of this nonquadratic estimator is within a constant factor of the
lower bound, and hence the estimator is minimax rate optimal.

3.1. Fully efficient estimation: Besov and $L_p$ balls with $\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$.

In parametric problems, Fisher Information provides a standard benchmark 
for the performance of an estimator. These bounds are often asymptotically 
attainable. The information bound is often useful in semiparametric models 
as well. See, for example, [2]. The problem of estimating a quadratic func- 
tional received attention by Ritov and Bickel [26] as an example where the 
information is strictly positive although it is not always possible to achieve 
the information bound. In the present context of estimating the quadratic
functional $Q(\theta)$ the information can easily be calculated to be $I(\theta) = \frac{1}{4 \sum \theta_i^2}$.

Standard theory then yields the lower bound

(17) $\inf_{\hat Q} \sup_{N_\varepsilon(\theta)} E_\theta(\hat Q - Q(\theta))^2 \ge \frac{1}{n I(\theta)}(1 + o(1)),$

where $N_\varepsilon(\theta) = \{(1 - t)\theta : 0 < t < \varepsilon\}$ and $0 < \varepsilon < 1$. It then directly follows
that (17) provides a lower bound for the minimax risk over a parameter
space $\Theta$ whenever $N_\varepsilon(\theta) \subset \Theta$. In particular, the information bound given
in (17) immediately yields

(18) $\inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \ge \frac{4M^2}{n}(1 + o(1))$

for $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$. In Section 2 a simple quadratic procedure
was given which attains the bound given in (18) over Besov and $L_p$ balls
with $p < 2$ and $\alpha > \frac{1}{p} - \frac{1}{4}$.

We now consider Besov balls and $L_p$ balls where $p < 2$ and $\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$.
Corollary 1 shows that in this case the exponent of the minimax quadratic
rate of convergence is less than 1. We shall show that in this case fully efficient
estimation is possible by using nonquadratic rules. One such fully efficient
rule can be given as follows.

Let $m$ be a given positive integer. Divide the indices $i$ beyond $m$ into
blocks of increasing block size so that the $j$th block is of the size $2^j m$. For
$i$ in block $j$, set $\tau_i = 2j$, that is,

(19) $\tau_i = 2 \lceil \log_2 \frac{i}{m} \rceil, \quad i > m,$

where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$.

For $i \ge m + 1$, set $\mu_{n,i} = E_0\{(Y_i^2 - \frac{\tau_i}{n})_+\}$ where the expectation is taken
under $\theta = 0$. Let $J^*$ be the largest integer such that $2^{J^*} m \le n^{1/(2s)} \log n$
where once again $s = \alpha + \frac{1}{2} - \frac{1}{p}$. Set the estimator of the quadratic functional

(20) $\hat Q(m) = \sum_{i=1}^{m} (Y_i^2 - \frac{1}{n}) + \sum_{i=m+1}^{2^{J^*} m} \{(Y_i^2 - \frac{\tau_i}{n})_+ - \mu_{n,i}\}.$

The parameter $m$ serves as a tuning parameter. We shall choose different
$m$ for different cases. The nonquadratic estimator $\hat Q(m)$ is built from a
quadratic part and coordinatewise thresholding with slowly growing threshold
levels. The thresholding terms are used to guard against individual large
terms in the tail.
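The construction can be sketched in code. This is a hedged illustration, not the authors' implementation: it assumes the estimator has the form "quadratic part over $i \le m$ plus recentered thresholded squares over $m < i \le 2^{J^*} m$", with threshold level $\tau_i = 2\lceil \log_2(i/m) \rceil$ and cutoff $2^{J^*} m \le n^{1/(2s)} \log n$. The function names, the convention that `y[0]` holds $Y_1$, the truncation to a finite list, and the empty thresholding range when no $J \ge 0$ satisfies the constraint are all assumptions made for the example.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _sf(x):
    """Standard normal upper tail P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mu_null(tau, n):
    """E[(Y^2 - tau/n)_+] for Y ~ N(0, 1/n), in closed form."""
    a = math.sqrt(tau)
    return 2.0 * (a * _pdf(a) + (1.0 - tau) * _sf(a)) / n


def tau_index(i, m):
    """Threshold level 2 * ceil(log2(i / m)) for i > m, using integer arithmetic."""
    k = 0
    while m * 2**k < i:
        k += 1
    return 2 * k


def j_star(n, m, s):
    """Largest integer J >= 0 with 2^J * m <= n^(1/(2s)) * log(n); None if there is none."""
    bound = n ** (1.0 / (2.0 * s)) * math.log(n)
    if m > bound:
        return None
    j = 0
    while m * 2 ** (j + 1) <= bound:
        j += 1
    return j


def q_hat(y, n, m, s):
    """Quadratic part plus recentered thresholded squares; y[0] holds Y_1."""
    quad = sum(v * v - 1.0 / n for v in y[:m])
    j = j_star(n, m, s)
    last = m * 2**j if j is not None else m   # empty thresholding range if no valid J
    thresh = 0.0
    for i in range(m + 1, min(last, len(y)) + 1):
        tau = tau_index(i, m)
        # Keep only the part of Y_i^2 above tau/n, then remove its mean under theta = 0.
        thresh += max(y[i - 1] ** 2 - tau / n, 0.0) - mu_null(tau, n)
    return quad + thresh


if __name__ == "__main__":
    n, m, s = 100, 10, 0.5
    zeros = [0.0] * 320
    spike = list(zeros)
    spike[100] = 1.0   # one large coefficient in the tail, at index i = 101
    print(q_hat(zeros, n, m, s), q_hat(spike, n, m, s))
```

In the usage example the tail spike of size 1 is picked up almost in full (its square minus $\tau_i / n$), whereas a coordinate whose square stays below $\tau_i / n$ leaves the estimate unchanged; this is the guarding behaviour described above.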

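For the case $p < 2$ and $\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$, set $m_2 =$ in (20) and define
the estimator $\hat Q_2$ as

(21) $\hat Q_2 = \hat Q(m_2) = \sum_{i=1}^{m_2} (Y_i^2 - \frac{1}{n}) + \sum_{i=m_2+1}^{2^{J^*} m_2} \{(Y_i^2 - \frac{\tau_i}{n})_+ - \mu_{n,i}\}.$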

The following theorem shows that the estimator $\hat Q_2$ is fully efficient.

Theorem 2. Let $0 < p < 2$ and $\alpha > \frac{1}{2p}$. Then the estimator $\hat Q_2$ defined
in (21) is fully efficient over Besov balls and $L_p$ balls, that is, it satisfies

(22) $\sup_{\theta \in \Theta} E_\theta(\hat Q_2 - Q(\theta))^2 = \frac{4M^2}{n}(1 + o(1)),$

where $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$.

Comparing (22) with (14) shows that in the case $\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$ nonquadratic
rules can be fully efficient although all quadratic rules are necessarily
rate suboptimal.

Remark 1. Note that the condition $s = \alpha + \frac{1}{2} - \frac{1}{p} > 0$ implies that
$\alpha > \frac{1}{2p}$ whenever $0 < p \le 1$. Hence fully efficient estimation of $Q(\theta)$ over
Besov balls and $L_p$ balls is always possible when $0 < p \le 1$. For $L_p$ balls this
has already been noted in [23].
Remark 2. Although the primary focus in the construction of $\hat Q_2$ is on
the case $\frac{1}{2p} < \alpha < \frac{1}{p} - \frac{1}{4}$, the estimator $\hat Q_2$ is also fully efficient when
$\alpha > \frac{1}{p} - \frac{1}{4}$. The quadratic part of $\hat Q_2$ equals $\hat Q_1$ given in (15). The contribution
of the thresholding part of $\hat Q_2$ is negligible in the case $\alpha > \frac{1}{p} - \frac{1}{4}$.

3.2. Besov balls and $L_p$ balls with $\alpha \le \frac{1}{2p}$. So far we have focused on
parameter spaces where fully efficient estimation is possible. We now turn
to both Besov balls and $L_p$ balls with $\alpha \le \frac{1}{2p}$ and construct a nonquadratic
estimator which has a much faster rate of convergence than the minimax
quadratic rate given in Section 2. This result again shows that quadratic
rules are rate suboptimal and there is much to be gained by using
nonquadratic rules.

Let $m_3 = n^{p/(1+2ps)}$ and set the estimator $\hat Q_3$ of the quadratic functional
$Q = \sum_{i=1}^{\infty} \theta_i^2$ as $\hat Q(m)$ in (20) with $m = m_3$. That is,

(23) $\hat Q_3 = \hat Q(m_3) = \sum_{i=1}^{m_3} (Y_i^2 - \frac{1}{n}) + \sum_{i=m_3+1}^{2^{J^*} m_3} \{(Y_i^2 - \frac{\tau_i}{n})_+ - \mu_{n,i}\},$

where once again $J^*$ is the largest integer such that $2^{J^*} m \le n^{1/(2s)} \log n$. The
following provides an upper bound for the risk of the estimator $\hat Q_3$.

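Theorem 3. Let $0 < p < 2$ and $\alpha \le \frac{1}{2p}$. The estimator $\hat Q_3$ given in (23)
satisfies

(24) $\sup_{\theta \in \Theta} E_\theta(\hat Q_3 - Q(\theta))^2 \le C n^{-(2 - p/(1+2ps))}(1 + o(1)),$

where $C > 0$ is a constant and $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$.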

It is easy to check that if $p < 2$ then $2 - \frac{p}{1+2ps} > \frac{8s}{1+4s}$. Hence quadratic
rules are necessarily rate suboptimal when $p < 2$ and $\alpha \le \frac{1}{2p}$. In the next
section it is shown that no estimator has maximum risk converging faster
than $n^{-(2 - p/(1+2ps))}$ and thus the estimator $\hat Q_3$ is minimax rate optimal.
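The comparison of exponents, $2 - p/(1+2ps) > 8s/(1+4s)$ for $p < 2$, is a short computation. The following worked check uses only $p > 0$ and $s > 0$, so that both denominators are positive.

```latex
\begin{align*}
2 - \frac{8s}{1+4s} &= \frac{2}{1+4s}, \\
2 - \frac{p}{1+2ps} > \frac{8s}{1+4s}
  &\iff \frac{2}{1+4s} > \frac{p}{1+2ps} \\
  &\iff 2(1 + 2ps) > p(1 + 4s) \\
  &\iff 2 + 4ps > p + 4ps \\
  &\iff p < 2.
\end{align*}
```

So the gap between the two exponents is strict for every $p < 2$ and closes exactly at $p = 2$.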

The analysis of the estimators $\hat Q_2$ and $\hat Q_3$ relies on a detailed analysis
of bias and variance of thresholding estimators for each coordinate. The 
following lemma may also be of independent interest. 

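Lemma 1. Let $X \sim N(\theta, \frac{1}{n})$ and $\tau \ge 1$. Set $\mu_0 = E_0\{(X^2 - \frac{\tau}{n})_+\}$ where
the expectation is taken under $\theta = 0$. Let $\hat Q = (X^2 - \frac{\tau}{n})_+ - \mu_0$. Then

(25) $|\mu_0| \le \frac{4}{\sqrt{2\pi}\, n \tau^{1/2} e^{\tau/2}},$

(26) $|E_\theta \hat Q - \theta^2| \le \min\{\frac{2\tau}{n}, \theta^2\}$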
n 
and the variance of $\hat Q$ satisfies



(27) 



Var(Q) < — + 




Combining the results given in Section 2 as well as this section for both 
quadratic and nonquadratic rules, we can compare the optimal rates of con- 
vergence over Besov and L p balls. Figure 1 gives a comparison for the case 
of p = 1.25 as a function of a. It illustrates the different elbow phenom- 
ena for the minimax rate of convergence and the minimax quadratic rate of 
convergence. 
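The two elbows can be tabulated directly. The sketch below assumes the exponents read off from the results above: the minimax quadratic exponent is $\min\{1, 8s/(1+4s)\}$ for $p < 2$, the minimax exponent is $1$ for $\alpha > 1/(2p)$ and $2 - p/(1+2ps)$ otherwise, and for $p \ge 2$ both equal $\min\{1, 8\alpha/(1+4\alpha)\}$. The function name `rate_exponents` is illustrative.

```python
def rate_exponents(alpha, p):
    """Return (r_star, r_quad): exponents of the minimax rate and of the
    minimax quadratic rate, so that the risks scale like n^(-r)."""
    s = alpha + 0.5 - 1.0 / p
    if s <= 0:
        raise ValueError("requires s = alpha + 1/2 - 1/p > 0")
    if p >= 2:
        # Quadratically convex case: quadratic rules are rate optimal.
        r = min(1.0, 8.0 * alpha / (1.0 + 4.0 * alpha))
        return r, r
    r_quad = min(1.0, 8.0 * s / (1.0 + 4.0 * s))
    if alpha > 1.0 / (2.0 * p):
        r_star = 1.0                                # fully efficient regime
    else:
        r_star = 2.0 - p / (1.0 + 2.0 * p * s)      # nonparametric regime
    return r_star, r_quad


if __name__ == "__main__":
    # For p = 1.25 the quadratic elbow is at alpha = 1/p - 1/4 = 0.55,
    # the minimax elbow at alpha = 1/(2p) = 0.4.
    for alpha in (0.35, 0.40, 0.45, 0.50, 0.55, 0.60):
        print(alpha, rate_exponents(alpha, 1.25))
```

Between the two elbows the first component is already 1 while the second is still below 1, which is the region where only nonquadratic rules are fully efficient.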

3.3. Minimax lower bounds. As shown earlier, the information bound
given in (18) is sharp for estimating $Q(\theta)$ over Besov balls and $L_p$ balls
when $0 < p < 2$ and $\alpha > \frac{1}{2p}$, although sometimes nonquadratic rules are
needed to attain the bound. When $0 < p < 2$ and $\alpha < \frac{1}{2p}$ the information
bound is no longer attainable. In this section we provide an improved lower
bound which shows that the minimax rate of convergence is slower than
the usual parametric rate. Furthermore these lower bounds show that the
nonquadratic estimator $\hat Q_3$ given in (23) is minimax rate optimal.

The derivation of the lower bound given in this section differs from the 
standard technique of inscribing a hardest hyperrectangle and using the 
Bayes risk for a prior supported on the hyperrectangle as a lower bound to 
the minimax risk. The hardest hyperrectangle technique works when the
parameter space is quadratically convex. See, for example, [11] and [15]. 
However, this technique does not work in our context where the hardest 
hyperrectangle submodel is not as difficult as the full model. The lower 
bound given below is based on a mixture prior which mixes over a rich 
collection of hyperrectangles. The mixing increases the difficulty of the Bayes 
estimation problem and results in a sharper lower bound. 

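Theorem 4. The minimax risks for estimating the quadratic functional
$Q(\theta) = \sum \theta_i^2$ over the Besov balls $B^\alpha_{p,q}(M)$ and $L_p$ balls $L_p(\alpha, M)$ satisfy, for
some constant $C > 0$,

(28) $\inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \ge \frac{4M^2}{n}(1 + o(1))$ when $0 < p < 2$ and $\alpha > \frac{1}{2p}$,
     $\inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \ge C n^{-(2 - p/(1+2ps))}$ when $0 < p < 2$ and $\alpha \le \frac{1}{2p}$,

where $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$.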




Fig. 1. Comparison of exponents in the minimax rate of convergence and minimax
quadratic rate of convergence for $p = 1.25$ (horizontal axis: $\alpha$).



The lower bounds show that when $p < 2$ and $\alpha < \frac{1}{2p}$ the optimal rate is
slower than the parametric rate. A comparison of the lower bound given
above with the upper bound given in (24) shows that in this case the minimax
rate of convergence is $n^{-(2 - p/(1+2ps))}$ and the nonquadratic procedure
$\hat Q_3$ is minimax rate optimal.

Table 1
Comparison of minimax and minimax quadratic rates of convergence

            $0 < p < 2$                                                                        $p \ge 2$
            $\alpha \le 1/(2p)$      $1/(2p) < \alpha < 1/p - 1/4$   $\alpha \ge 1/p - 1/4$   $\alpha < 1/4$          $\alpha \ge 1/4$
$r^*_Q$     $8s/(1+4s)$              $8s/(1+4s)$                     $1$                      $8\alpha/(1+4\alpha)$   $1$
$r^*$       $(4ps+2-p)/(1+2ps)$      $1$                             $1$                      $8\alpha/(1+4\alpha)$   $1$

The results given in Section 2 and this section can be summarized in
Table 1 above. For comparison and completeness we also include the well-known
results for $p \ge 2$ in the table. As in Section 2 define $r^*$ and $r^*_Q$ to be
the exponents of the minimax and minimax quadratic rate of convergence,
respectively. We can compare the values of $r^*$ and $r^*_Q$ for all cases in Table 1,
where as usual we assume $s = \alpha + \frac{1}{2} - \frac{1}{p} > 0$.

4. Simple adaptation. The main focus of the paper is on deficiencies of
quadratic estimators and on the minimax performance of the nonquadratic
estimators. The estimators $\hat Q_2$ and $\hat Q_3$ depend on the parameters $\alpha$ and $p$
and are thus not adaptive. A modification of the estimator $\hat Q_2$ can achieve
full adaptation over collections of Besov balls and $L_p$ balls when fully efficient
estimation is possible.

Set $m =$ and let $\gamma > 1$ be a constant. Let $J^*$ be the largest integer
such that $2^{J^*} m \le n^{\gamma} \log n$. Set

(29) $\hat Q_4 = \sum_{i=1}^{m} (Y_i^2 - \frac{1}{n}) + \sum_{i=m+1}^{2^{J^*} m} \{(Y_i^2 - \frac{\tau_i}{n})_+ - \mu_{n,i}\},$

where $\mu_{n,i}$ is defined the same as in $\hat Q_2$. It is then not difficult to show that
for all < p < 2 and a > ± + (i - \ + i) + 

(30) $\sup_{\theta \in \Theta} E_\theta(\hat Q_4 - Q(\theta))^2 = \frac{4M^2}{n}(1 + o(1)),$

where $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$.

Hence, the estimator $\hat Q_4$ is adaptively fully efficient over the collection

{ BpV M):0< P <2,a>i- + (i--i + i-) }. 
More interestingly, if we take $J^* = \infty$ and set

(31) $\hat Q_5 = \sum_{i=1}^{m} (Y_i^2 - \frac{1}{n}) + \sum_{i=m+1}^{\infty} \{(Y_i^2 - \frac{\tau_i}{n})_+ - \mu_{n,i}\},$

then the estimator $\hat Q_5$ is adaptively fully efficient over all Besov balls and
$L_p$ balls with $p < 2$ and $\alpha > \frac{1}{2p}$. In fact, $\hat Q_5$ is also adaptively fully efficient
over Besov balls and $L_p$ balls with $p \ge 2$ and $\alpha > \frac{1}{4}$. It is easy to see from
Table 1 that these are the largest collections of Besov balls and $L_p$ balls
over which fully efficient estimation is possible. Therefore the estimator $\hat Q_5$
is adaptively fully efficient whenever fully efficient estimation is possible. We
summarize the results in the following theorem.

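Theorem 5. The estimator $\hat Q_5$ defined in (31) satisfies

(32) $\sup_{\theta \in \Theta} E_\theta(\hat Q_5 - Q(\theta))^2 = \frac{4M^2}{n}(1 + o(1)),$

where $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$, with $0 < p < 2$ and $\alpha > \frac{1}{2p}$ or $p \ge 2$
and $\alpha > \frac{1}{4}$.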

Similarly we can also consider the case where the minimax rate of convergence
is nonparametric. Fix a constant $0 < r < 1$ and let

$\Omega(r) = \{(\alpha, p) : 0 < p < 2, \ 0 < \alpha \le \frac{1}{2p}, \ \frac{p}{1+2ps} = 2 - r\},$

where, as usual, $s = \alpha + \frac{1}{2} - \frac{1}{p} > 0$. Note that the minimax rate of convergence
for estimating the quadratic functional $Q(\theta) = \sum \theta_i^2$ over any Besov or $L_p$
ball with parameters $(\alpha, p) \in \Omega(r)$ is $n^{-r}$.

Let $m = n^{2-r}$ and let $\tau_i$ be defined as in (19). Set

(33) * = £(tf-i) + f 

1=1 i=m+l ' 

Then it is easy to show that the estimator $\hat Q_6$ adaptively attains the minimax
rate of convergence $n^{-r}$ over each Besov ball $B^\alpha_{p,q}(M)$ or $L_p$ ball $L_p(\alpha, M)$
with $(\alpha, p) \in \Omega(r)$.

The discussion given above is restricted to cases where the minimax rates 
of convergence over all the parameter spaces in the collection are the same. A 
more general approach should consider adaptation over spaces with different 
minimax rates of convergence. The more standard case where $p \ge 2$ has been
considered by Klemelä [22]. The general case is an interesting but challenging
problem. A complete treatment is given in [7]. 
5. Connection between estimation and testing. As is common in statistical
inference there are strong connections among the problems of estimation
and testing quadratic functionals. In this context the testing problem which 
has received most attention is that of testing the null hypothesis 

$H_0 : \theta = 0$

against the alternative 

The difficulty of this testing problem depends on assumptions imposed on 
the unknown $\theta$. Particularly important early work on this problem can be
traced to Ingster [20]. 

There are two major related goals in these problems. One goal is to find
the test which, given a particular choice of $a_n$, minimizes the sum of the
type I and maximum type II errors. Alternatively we may fix the maximal
sum of the type I and type II errors and try to find the smallest possible
$a_n$ compatible with this constraint along with the corresponding test. More
specifically, for a given $0 < \gamma < 1$ let $a_n(\gamma)$ be the smallest choice of $a_n$ for
which there is a test with type I plus maximal type II error less than or
equal to $\gamma$.

The solution to this testing problem always yields lower bounds to the 
corresponding estimation problem as follows. First note that every estimator 
$\hat Q$ of $\sum \theta_i^2$ gives rise to a test of this hypothesis in the following way. If

Q < , then the null hypothesis Hq is accepted and if Q > the null 
is rejected. It then immediately follows that 

(34) su P £(Q - 5> 2 ) 2 > = paid). 

It is then easy to translate an asymptotic statement about the testing
problem into asymptotic lower bounds for the estimation problem. For example,
if $r_t$ is the optimal minimax rate for the testing problem, namely
$a_n(\gamma) \sim n^{-r_t}$, it then follows that

(35) $\inf_{\hat Q} \sup_{\theta} E(\hat Q - \sum \theta_i^2)^2 \ge C n^{-2 r_t}(1 + o(1)).$

Hence knowledge about the optimal rate in testing immediately yields
a lower bound for the optimal rate in the estimation problem. Likewise
upper bounds on the estimation problem yield upper bounds on the testing
problem. For example, if

$\sup_{\theta} E(\hat Q - \sum \theta_i^2)^2 \sim n^{-r^*},$

then

$a_n(\gamma) \le C n^{-r^*/2}(1 + o(1)).$
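A sketch of how a risk bound turns into a test, under assumptions that are made only for this illustration and are not recoverable from the text above: the alternative is taken to be $\sum \theta_i^2 \ge a_n$, the test rejects when $\hat Q > a_n / 2$, and $R$ denotes the maximum risk of $\hat Q$ over the null and the alternative. Chebyshev's inequality then gives the following.

```latex
% Sketch under the assumptions stated in the lead-in; the constant 8 is specific to it.
\begin{align*}
P_0(\hat Q > a_n/2) &\le \frac{E_0 \hat Q^2}{(a_n/2)^2} \le \frac{4R}{a_n^2}, \\
P_\theta(\hat Q \le a_n/2) &\le P_\theta\big(|\hat Q - Q(\theta)| \ge a_n/2\big) \le \frac{4R}{a_n^2}
  \quad \text{whenever } Q(\theta) \ge a_n, \\
\text{type I} + \text{maximal type II} &\le \frac{8R}{a_n^2} \le \gamma
  \quad \text{as soon as } a_n \ge (8R/\gamma)^{1/2}.
\end{align*}
```

A maximum risk of order $n^{-r^*}$ therefore allows $a_n$ of order $n^{-r^*/2}$, which matches the scaling between the estimation and testing rates stated above.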
This testing problem has been considered in [24] over Besov balls $B^\alpha_{p,q}$
for the cases where $\frac{1}{\alpha} < p < 2$. Although the testing theory has not been
developed for the cases where $\alpha \le \frac{1}{2p}$, the present paper does immediately
yield the optimal rates for testing whenever $\alpha \le \frac{1}{2p}$. In this case the optimal
rate for the testing is $n^{-(1 - p/(2(1+2ps)))}$ and this rate has the same functional
form as the minimax rate given in [24] in the case $\alpha > \frac{1}{p}$. For the range
$\frac{1}{2p} < \alpha < \frac{1}{p}$ the estimation problem appears to be "harder" than the testing
problem and our results on estimating the quadratic functional do not yield
sharp lower bounds or upper bounds on testing.

6. Discussion. There are strong similarities between estimating a quadratic
functional over a parameter space that is not quadratically convex
and estimating linear functionals over a nonconvex parameter space.
For estimating the quadratic functional it is shown in Section 2 that the 
maximum risk of quadratic procedures over a parameter space is equal to 
the maximum risk over the quadratic convex hull of the parameter space. 
On the other hand it was shown in [5] that for estimating linear functionals 
the maximum risk of linear procedures over a parameter space is equal to 
the maximum risk over the convex hull of the parameter space. 

There is also some similarity between the work on estimating a quadratic 
functional and that of estimating all the coefficients under sum of squared 
error loss. In both problems extra care must be taken for parameter spaces 
where a few large coefficients can degrade the performance of naive esti- 
mators. Under sum of squared error loss the naive estimators correspond 
to linear estimators. Such estimators can perform well for p > 2. In par- 
ticular there are simple linear procedures which are minimax rate optimal. 
On the other hand, if p < 2 then minimax rate optimal procedures must be 
nonlinear. The case where p > 2 is sometimes referred to as the dense case 
since in this situation the difficulty of the problem is caused by situations 
where there are many small coefficients. The case where p < 2 corresponds 
to sparse situations where there may be a "few" large coefficients which 
if not estimated well inflate the risk. For global estimation under sum of 
squared error loss Donoho and Johnstone [8] have shown that fairly simple 
term-by-term thresholding rules can then yield minimax rate optimal pro- 
cedures. In density estimation problems Donoho, Johnstone, Kerkyacharian 
and Picard [9] showed a similar phenomenon exists. 

For estimating quadratic functionals, the naive estimators are quadratic
rather than linear. As in the problem of global estimation, the case
$p > 2$ is easiest: minimax rate optimal quadratic estimators always exist.
However, for estimating a quadratic functional quadratic rules can sometimes
be asymptotically fully efficient even in cases when $p < 2$. Such is the
case when $p < 2$ and $\alpha > \frac{1}{p} - \frac{1}{4}$. When $p < 2$ and $\alpha < \frac{1}{p} - \frac{1}{4}$, quadratic rules
are rate suboptimal while in some cases nonquadratic rules can be fully
efficient. As in the case of global estimation, term-by-term thresholding can
yield minimax rate optimal procedures.

In global estimation minimax rate optimal procedures can be based either
on soft thresholding or hard thresholding. The same holds when estimating
the quadratic functional: the form of this thresholding is not important.
There is an analog of Lemma 1 for hard thresholding and so minimax rate
optimal procedures can be based on hard thresholding. More specifically, let
the estimator $\hat Q(m)$ be defined as

$$\hat Q(m) = \sum_{i=1}^{m}\Big(Y_i^2 - \frac{1}{n}\Big) + \sum_{i=m+1}^{2^{J_*}m}\Big[Y_i^2\, I\Big(Y_i^2 > \frac{\tau_i}{n}\Big) - \mu_{\tau_i}\Big], \tag{36}$$

where $\tau_i$ is given as in (19) and $\mu_{\tau_i} = E\{Y_i^2 I(Y_i^2 > \frac{\tau_i}{n})\}$ with the expectation
taken under $\theta = 0$. Then the results of Theorems 2, 3 and 5 hold for
$\hat Q(m)$ with the same choices of $m$.
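
A hard-thresholding estimator of this kind can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each tail coordinate is kept when $Y_i^2 > \tau_i/n$ and is centered by its mean under $\theta = 0$, and the names `null_mean_hard`, `hard_threshold_q` and the argument `taus` (one threshold per tail coordinate) are hypothetical.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Standard normal upper tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def null_mean_hard(tau, n):
    """E[Y^2 1(Y^2 > tau/n)] for Y ~ N(0, 1/n) (assumed centering constant)."""
    r = math.sqrt(tau)
    # E[Z^2 1(|Z| > r)] = 2 * (r * phi(r) + P(Z > r)) for standard normal Z.
    return 2.0 * (r * phi(r) + upper_tail(r)) / n


def hard_threshold_q(y, n, m, taus):
    """First m terms unbiased; remaining terms hard-thresholded and centered."""
    head = sum(v * v - 1.0 / n for v in y[:m])
    tail = 0.0
    for v, tau in zip(y[m:], taus):
        kept = v * v if v * v > tau / n else 0.0
        tail += kept - null_mean_hard(tau, n)
    return head + tail


# Toy usage: one large coordinate survives the threshold, small ones do not.
estimate = hard_threshold_q([0.9, 0.05, 0.8, 0.01], n=100, m=1, taus=[2.0, 2.0, 2.0])
```

With `tau = 0` the centering constant reduces to `1/n`, so the tail terms coincide with the unbiased terms $Y_i^2 - 1/n$ used for the first `m` coordinates.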

Finally we should also note that the term-by-term thresholding proce- 
dures used here are quite different from the global thresholding rules used 
in [21] and [27], which are designed for estimation over quadratically convex 
parameter spaces where the worst case is always given by a large number 
of small coefficients but where the exact locations of these coefficients are 
unknown. 



7. Proofs. For proofs involving both Besov balls and $L_p$ balls we shall
only give details for the $L_p$ balls since the proofs for Besov balls are entirely
analogous. The proofs of Theorems 2, 3 and 5 all rely on the technical result
given in Lemma 1 and are similar. We shall present a detailed proof for
Theorem 3, a brief proof for Theorem 2 and omit the proof for Theorem 5.
In the proofs we shall denote by $C$ a positive constant not depending on $n$
that may vary from place to place.



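Proof of Theorem 1. Since $\Theta \subset \mathrm{Q.Hull}(\Theta)$, it is obvious that

$$\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \le \sup_{\theta \in \mathrm{Q.Hull}(\Theta)} E_\theta(\hat Q - Q(\theta))^2.$$

Let $\hat Q$ be a diagonal quadratic estimator of $Q(\theta)$. Write $\hat Q = \sum_i a_i Y_i^2 + b$.
Let $\theta \in \mathrm{Q.Hull}(\Theta)$ and $\theta^2 = \sum_j \lambda_j (\theta^{(j)})^2$ with $\theta^{(j)} \in \Theta$, $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$.
Write

$$V(\theta) = \mathrm{Var}_\theta(\hat Q) \quad \text{and} \quad B(\theta) = E_\theta \hat Q - Q(\theta).$$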

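Then

$$V(\theta) = \sum_i a_i^2\Big(\frac{4\theta_i^2}{n} + \frac{2}{n^2}\Big) = \sum_j \lambda_j \sum_i a_i^2\Big(\frac{4(\theta_i^{(j)})^2}{n} + \frac{2}{n^2}\Big) = \sum_j \lambda_j V(\theta^{(j)})$$

and

$$B(\theta) = \sum_i a_i \theta_i^2 + \sum_i \frac{a_i}{n} + b - \sum_i \theta_i^2 = \sum_j \lambda_j \Big\{ \sum_i a_i (\theta_i^{(j)})^2 + \sum_i \frac{a_i}{n} + b - \sum_i (\theta_i^{(j)})^2 \Big\} = \sum_j \lambda_j B(\theta^{(j)}).$$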

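Let $\max_j \{V(\theta^{(j)}) + B^2(\theta^{(j)})\} = D$. Then the Cauchy-Schwarz inequality
yields

$$\begin{aligned}
E_\theta(\hat Q - Q(\theta))^2 &= V(\theta) + B^2(\theta) = \sum_j \lambda_j V(\theta^{(j)}) + \Big(\sum_j \lambda_j B(\theta^{(j)})\Big)^2 \\
&\le D - \sum_j \lambda_j B^2(\theta^{(j)}) + \Big(\sum_j \lambda_j B(\theta^{(j)})\Big)^2 \\
&\le D \\
&\le \sup_{\theta' \in \Theta} E_{\theta'}(\hat Q - Q(\theta'))^2.
\end{aligned}$$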

Hence for any diagonal quadratic estimator $\hat Q$

$$\sup_{\theta \in \mathrm{Q.Hull}(\Theta)} E_\theta(\hat Q - Q(\theta))^2 = \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2. \tag{37}$$

Since $\Theta$ is orthosymmetric it follows from [15] that minimax quadratic
procedures are found within the class of diagonal quadratic procedures. So it
follows from (37) that for any quadratic estimator $\hat Q$

$$\sup_{\theta \in \mathrm{Q.Hull}(\Theta)} E_\theta(\hat Q - Q(\theta))^2 = \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2,$$

and hence (8) holds. □
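
The convexity step in this proof can be checked numerically. The sketch below is an illustration only: it assumes $Y_i \sim N(\theta_i, 1/n)$, uses the exact risk of a diagonal quadratic estimator $\sum_i a_i Y_i^2 + b$ (variance plus squared bias), and compares the risk at a quadratic mixture of three points with the worst risk among those points. The function names and the toy numbers are hypothetical.

```python
import math


def quadratic_risk(theta, a, b, n):
    """Exact risk of sum_i a_i * Y_i^2 + b for Q(theta) = sum_i theta_i^2."""
    # Var(Y_i^2) = 4 * theta_i^2 / n + 2 / n^2 when Y_i ~ N(theta_i, 1/n).
    variance = sum(ai * ai * (4.0 * t * t / n + 2.0 / n ** 2) for ai, t in zip(a, theta))
    bias = sum(ai * (t * t + 1.0 / n) for ai, t in zip(a, theta)) + b - sum(t * t for t in theta)
    return variance + bias * bias


def quadratic_hull_point(points, weights):
    """Point whose squared coordinates are the weighted average of the squares."""
    dim = len(points[0])
    return [
        math.sqrt(sum(w * p[i] ** 2 for w, p in zip(weights, points)))
        for i in range(dim)
    ]


n = 100
a = [1.0, 0.5, 0.0]
b = -sum(a) / n
points = [[0.3, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.3]]
weights = [0.2, 0.3, 0.5]

mixed = quadratic_hull_point(points, weights)
mixed_risk = quadratic_risk(mixed, a, b, n)
worst_vertex_risk = max(quadratic_risk(p, a, b, n) for p in points)
```

Because the variance and bias are both linear in the squared coordinates, `mixed_risk` never exceeds `worst_vertex_risk`, which is the content of the chain of inequalities above.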

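Proof of Lemma 1. Denote by $\phi(z)$ and $\Phi(z)$ the density and cumulative
distribution function of $Z$, respectively, and set $\bar\Phi(z) = 1 - \Phi(z)$.
It then follows from the alternating series bound for Gaussian tails
$\bar\Phi(z) \ge (\frac{1}{z} - \frac{1}{z^3})\phi(z)$ for $z > 0$ that

$$\begin{aligned}
\mu_0 &= \frac{2}{n}\int_{\tau^{1/2}}^{\infty} (z^2 - \tau)\frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz \\
&= \frac{2\tau^{1/2}}{\sqrt{2\pi}\, n e^{\tau/2}} - \frac{2(\tau - 1)}{n}\bar\Phi(\tau^{1/2}) \\
&\le \frac{2\tau^{1/2}}{\sqrt{2\pi}\, n e^{\tau/2}} - \frac{2(\tau - 1)}{\sqrt{2\pi}\, n e^{\tau/2}}\Big(\frac{1}{\tau^{1/2}} - \frac{1}{\tau^{3/2}}\Big) \\
&\le \frac{4}{\sqrt{2\pi}\,\tau^{1/2} n e^{\tau/2}}.
\end{aligned} \tag{38}$$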

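Set $B(\theta) = E_\theta \hat Q - \theta^2 = E_\theta(X^2 - \frac{\tau}{n})_+ - \mu_0 - \theta^2$. It is easy to check that

$$\theta^2 - \frac{\tau}{n} \le E_\theta\Big(X^2 - \frac{\tau}{n}\Big)_+ \le \theta^2 + \frac{1}{n}. \tag{39}$$

Hence

$$|B(\theta)| \le \frac{\tau}{n} + \mu_0 \le \frac{2\tau}{n}. \tag{40}$$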
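
The centering constant can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $\mu_0 = E(X^2 - \tau/n)_+$ for $X \sim N(0, 1/n)$, evaluates the closed form $\frac{2}{n}[\tau^{1/2}\phi(\tau^{1/2}) - (\tau - 1)\bar\Phi(\tau^{1/2})]$, and compares it with a midpoint-rule integral. The function names, the integration cutoff and the step count are choices made here, and the final comparison with $\tau/n$ assumes $\tau \ge 1$.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """Standard normal upper tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mu0_closed(tau, n):
    """Closed form of E(X^2 - tau/n)_+ for X ~ N(0, 1/n)."""
    r = math.sqrt(tau)
    return 2.0 * (r * phi(r) - (tau - 1.0) * upper_tail(r)) / n


def mu0_numeric(tau, n, upper=12.0, steps=20000):
    """Midpoint-rule value of (2/n) * integral of (z^2 - tau) * phi(z) over [sqrt(tau), upper]."""
    r = math.sqrt(tau)
    h = (upper - r) / steps
    total = 0.0
    for k in range(steps):
        z = r + (k + 0.5) * h
        total += (z * z - tau) * phi(z)
    return 2.0 * total * h / n


closed = mu0_closed(2.0, 50)
numeric = mu0_numeric(2.0, 50)
# For tau >= 1 the centering constant stays below tau / n on this grid.
below_tau_over_n = all(mu0_closed(t / 10.0, 50) <= (t / 10.0) / 50 for t in range(10, 200))
```

The two evaluations agree to many digits, and the grid check is consistent with bounding $\tau/n + \mu_0$ by $2\tau/n$ when $\tau \ge 1$.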



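Straightforward calculation yields for $\theta \ge 0$

$$B'(\theta) = \frac{2}{n^{1/2}}\big[\phi(\tau^{1/2} - n^{1/2}\theta) - \phi(\tau^{1/2} + n^{1/2}\theta)\big] - 2\theta\big(\Phi(\tau^{1/2} - n^{1/2}\theta) - \Phi(-\tau^{1/2} - n^{1/2}\theta)\big) \tag{41}$$

and

$$B''(\theta) = 2\tau^{1/2}\big[\phi(\tau^{1/2} - n^{1/2}\theta) + \phi(\tau^{1/2} + n^{1/2}\theta)\big] - 2\big[\Phi(\tau^{1/2} - n^{1/2}\theta) - \Phi(-\tau^{1/2} - n^{1/2}\theta)\big]. \tag{42}$$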

It suffices to only consider $\theta \ge 0$ since $B(\theta) = B(-\theta)$. It follows immediately
from (41) that for all $\theta \ge 0$, $B'(\theta) \ge -2\theta$ and hence

$$B(\theta) \ge -\theta^2. \tag{43}$$

On the other hand, for $0 \le \theta \le \frac{1}{\sqrt{n}}$ equation (42) yields

$$B''(\theta) \le \sup_{\tau \ge 1}\big\{2\tau^{1/2}\big[\phi(\tau^{1/2} - 1) + \phi(\tau^{1/2})\big]\big\} \le 2. \tag{44}$$

Note that $B'(0) = 0$ and hence for $0 \le \theta \le \frac{1}{\sqrt{n}}$

$$B'(\theta) \le 2\theta. \tag{45}$$

For $\theta \ge \frac{1}{\sqrt{n}}$ it follows from (41) that

$$B'(\theta) \le \frac{2}{\sqrt{n}} \le 2\theta \tag{46}$$

and it follows from $B(0) = 0$ that for all $\theta \ge 0$, $B(\theta) \le \theta^2$. Hence for all $\theta$

$$|B(\theta)| \le \theta^2 \tag{47}$$

and (26) now follows from (40) and (47). The proof will be complete once
we establish (27). First we state and prove the following simple lemma.

Lemma 2. For any two random variables $X$ and $Y$,

$$\mathrm{Var}(\max\{X, Y\}) \le \mathrm{Var}\,X + \mathrm{Var}\,Y. \tag{48}$$

In particular, for any random variable $X$,

$$\mathrm{Var}((X)_+) \le \mathrm{Var}\,X. \tag{49}$$

Proof. Without loss of generality we can assume $\mu_X = 0$ and $\mu_Y \ge 0$.
Let $Z = \max\{X, Y\}$. Then

$$EZ^2 \le EX^2 + EY^2 \tag{50}$$

and

$$EZ \ge \mu_Y. \tag{51}$$

Hence

$$\mathrm{Var}\,Z = EZ^2 - (EZ)^2 \le EX^2 + EY^2 - \mu_Y^2 = \mathrm{Var}\,X + \mathrm{Var}\,Y. \quad □ \tag{52}$$
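
Lemma 2 holds for any joint distribution, including the empirical distribution of a simulated sample, so it can be checked exactly on data. The sketch below is illustrative only; the particular dependent pair and the helper name `variance_gap` are hypothetical choices, and population variances of the sample are used so that the inequality applies without sampling error.

```python
import random
import statistics


def variance_gap(xs, ys):
    """Var X + Var Y - Var max(X, Y) under the empirical distribution."""
    zs = [max(x, y) for x, y in zip(xs, ys)]
    return statistics.pvariance(xs) + statistics.pvariance(ys) - statistics.pvariance(zs)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# A dependent, non-Gaussian second variable: the lemma needs no independence.
ys = [0.5 * x + rng.expovariate(1.0) for x in xs]

gap = variance_gap(xs, ys)
# Special case (49): taking Y identically 0 gives Var(X_+) <= Var X.
positive_part_gap = statistics.pvariance(xs) - statistics.pvariance([max(x, 0.0) for x in xs])
```

Both `gap` and `positive_part_gap` are nonnegative for every sample, not just on average, because the lemma applies to the empirical distribution itself.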

We now turn to the proof of (27). When $\theta^2 > \frac{1}{n}$ it follows from Lemma
2 that

$$\mathrm{Var}(\hat Q) \le \mathrm{Var}\,X^2 = \frac{4\theta^2}{n} + \frac{2}{n^2} \le \frac{6\theta^2}{n}$$

and so (27) holds.

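Now consider $\theta^2 \le \frac{1}{n}$. Because of the symmetry, it suffices to consider
$0 \le \theta \le \frac{1}{\sqrt{n}}$. Note that direct calculations show for $0 \le \theta \le \frac{1}{\sqrt{n}}$

$$\begin{aligned}
\mathrm{Var}(\hat Q) &\le E\Big(X^2 - \frac{\tau}{n}\Big)_+^2 = \int \Big(x^2 - \frac{\tau}{n}\Big)_+^2 \frac{\sqrt{n}}{\sqrt{2\pi}} e^{-n(x - \theta)^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}\,n^2}\int_{(z + n^{1/2}\theta)^2 > \tau} \big((z + n^{1/2}\theta)^2 - \tau\big)^2 e^{-z^2/2}\,dz \\
&\le \frac{1}{\sqrt{2\pi}\,n^2}\int_{\tau^{1/2} - n^{1/2}\theta}^{\tau^{1/2}} \big((z + n^{1/2}\theta)^2 - \tau\big)^2 e^{-z^2/2}\,dz + \frac{2}{\sqrt{2\pi}\,n^2}\int_{\tau^{1/2}}^{\infty} \big((z + 1)^2 - \tau\big)^2 e^{-z^2/2}\,dz \\
&\le \frac{1}{\sqrt{2\pi}\,n^2}\int_{\tau^{1/2} - n^{1/2}\theta}^{\tau^{1/2}} z e^{-z^2/2}\,dz \cdot \sup_{z \in [\tau^{1/2} - n^{1/2}\theta,\, \tau^{1/2}]} z^{-1}\big((z + n^{1/2}\theta)^2 - \tau\big)^2 \\
&\quad + \frac{2}{\sqrt{2\pi}\,n^2}\int_{\tau^{1/2}}^{\infty} \big[z^4 + 4z^3 + (-2\tau + 6)z^2 + (-4\tau + 4)z + (\tau - 1)^2\big] e^{-z^2/2}\,dz \\
&= V_1 + V_2.
\end{aligned}$$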
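Note that

$$\int_{\tau^{1/2} - n^{1/2}\theta}^{\tau^{1/2}} z e^{-z^2/2}\,dz \le \int_{\tau^{1/2} - 1}^{\tau^{1/2}} z e^{-z^2/2}\,dz = e^{-(\tau^{1/2} - 1)^2/2} - e^{-\tau/2}$$

and

$$\begin{aligned}
\sup_{z \in [\tau^{1/2} - n^{1/2}\theta,\, \tau^{1/2}]} z^{-1}\big((z + n^{1/2}\theta)^2 - \tau\big)^2 &= \sup_{x \in [0,\, n^{1/2}\theta]} \frac{x(x + 2\tau^{1/2})^2}{1 + (\tau^{1/2} - n^{1/2}\theta)x^{-1}} \\
&= n\theta^2 \tau^{-1/2}(2\tau^{1/2} + n^{1/2}\theta)^2 \\
&\le n\theta^2 \tau^{-1/2}(2\tau^{1/2} + 1)^2.
\end{aligned}$$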

Hence

$$V_1 \le \frac{\theta^2}{n} \cdot \sup_{x \ge 1}\Big\{\frac{1}{\sqrt{2\pi}} x^{-1}(2x + 1)^2\big[e^{-(x - 1)^2/2} - e^{-x^2/2}\big]\Big\} \le \frac{3\theta^2}{n}.$$



We now turn to the term $V_2$. Note that for any $p \ge 1$

$$\int_a^{\infty} z^p e^{-z^2/2}\,dz = a^{p-1} e^{-a^2/2} + (p - 1)\int_a^{\infty} z^{p-2} e^{-z^2/2}\,dz,$$

and in particular for any $a > 0$

$$\int_a^{\infty} e^{-z^2/2}\,dz \le a^{-1} e^{-a^2/2}.$$

Hence, after some algebra, we have

$$V_2 \le \frac{10\tau^{1/2} + 20\tau^{-1/2} + 24}{\sqrt{2\pi}\,n^2 e^{\tau/2}} \le \frac{4\tau^{1/2} + 18}{n^2 e^{\tau/2}}.$$

Hence, for $0 \le \theta \le \frac{1}{\sqrt{n}}$,

$$\mathrm{Var}(\hat Q) \le \frac{3\theta^2}{n} + \frac{4\tau^{1/2} + 18}{n^2 e^{\tau/2}}$$

and consequently, for all $\theta$, $\mathrm{Var}(\hat Q) \le \frac{6\theta^2}{n} + \frac{4\tau^{1/2} + 18}{n^2 e^{\tau/2}}$. □

Proof of Theorem 3. Set $m = m_3 = n^{p/(1+2ps)}$. Note that for $X \sim N(\mu, \sigma^2)$,
$\mathrm{Var}(X^2) = 4\mu^2\sigma^2 + 2\sigma^4$. Note also that $\tau_i = 2j$ for all coordinates
in the $j$th block beyond the initial $m$ terms. It then follows from Lemma 1
that

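$$\begin{aligned}
\mathrm{Var}(\hat Q_3) &\le \frac{2m}{n^2} + \frac{4}{n}\sum_{i=1}^{m}\theta_i^2 + \frac{6}{n}\sum_{i=m+1}^{2^{J_*}m}\theta_i^2 + \sum_{j=1}^{J_*} 2^{j-1} m\, \frac{4(2j)^{1/2} + 18}{n^2 e^{j}} \\
&\le C n^{-(2 - p/(1+2ps))}(1 + o(1))
\end{aligned} \tag{53}$$

for some constant $C > 0$, where the last step follows from the fact that for
any $b > 0$

$$\sum_{j=1}^{\infty} j^{1/2} e^{-bj} < \infty. \tag{54}$$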

For the bias note that equation (26) in Lemma 1 yields

$$|\mathrm{Bias}(\hat Q_3)| \le \sum_{i=m+1}^{2^{J_*}m} \min\Big(\theta_i^2, \frac{2\tau_i}{n}\Big) + \sum_{i=2^{J_*}m+1}^{\infty} \theta_i^2. \tag{55}$$

The second term in (55) is easy to bound. Note that for any $j \ge 0$, the $L_p$
ball constraint (9) yields for $p < 2$

$$\Big(\sum_{i=2^j m+1}^{2^{j+1}m} \theta_i^2\Big)^{1/2} \le \Big(\sum_{i=2^j m+1}^{2^{j+1}m} |\theta_i|^p\Big)^{1/p} \le M 2^{-js} m^{-s}. \tag{56}$$

Hence for all $\theta \in L_p(\alpha, M)$,

$$\sum_{i=2^{J_*}m+1}^{\infty} \theta_i^2 \le \sum_{j=J_*}^{\infty} M^2 2^{-2js} m^{-2s} \le C n^{-1/2}$$

for some constant $C > 0$.

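It remains to bound the first term in (55). Note that it is straightforward
to verify that for all $\theta \in L_p(\alpha, M)$ and all $j \ge 1$,

$$\sum_{i=2^{j-1}m+1}^{2^j m} |\theta_i|^p \le M^p 2^{ps} 2^{-jps} m^{-ps}. \tag{57}$$

Hence

$$\begin{aligned}
\sum_{i=m+1}^{2^{J_*}m} \min\Big(\theta_i^2, \frac{2\tau_i}{n}\Big) &= \sum_{j=1}^{J_*}\sum_{i=2^{j-1}m+1}^{2^j m} \min\Big(\theta_i^2, \frac{4j}{n}\Big) \\
&= \sum_{j=1}^{J_*}\sum_{i=2^{j-1}m+1}^{2^j m} \frac{4j}{n}\min\Big(1, \theta_i^2 \cdot \frac{n}{4j}\Big) \\
&\le \sum_{j=1}^{J_*}\sum_{i=2^{j-1}m+1}^{2^j m} \frac{4j}{n}\Big(\theta_i^2 \cdot \frac{n}{4j}\Big)^{p/2},
\end{aligned}$$

where the last step follows from the facts $\min(1, \theta_i^2 \cdot \frac{n}{4j}) \le 1$ and $\frac{p}{2} \le 1$.
Hence,

$$\begin{aligned}
\sum_{i=m+1}^{2^{J_*}m} \min\Big(\theta_i^2, \frac{2\tau_i}{n}\Big) &\le \sum_{j=1}^{J_*} \Big(\frac{4j}{n}\Big)^{1 - p/2} \sum_{i=2^{j-1}m+1}^{2^j m} |\theta_i|^p \\
&\le \Big\{ M^p 2^{ps+2-p} \sum_{j=1}^{\infty} j^{1-p/2} 2^{-jps} \Big\} \cdot m^{-ps} n^{p/2-1} \\
&\le C m^{-ps} n^{p/2-1}
\end{aligned} \tag{58}$$

for some constant $C > 0$, since $\sum_{j=1}^{\infty} j^{1-p/2} 2^{-jps} < \infty$. Hence, with
$m = n^{p/(1+2ps)}$,

$$\sum_{i=m+1}^{2^{J_*}m} \min\Big(\theta_i^2, \frac{2\tau_i}{n}\Big) \le C n^{-(2 - p/(1+2ps))/2}. \tag{59}$$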

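Hence for $p < 2$ and $\alpha < \frac{1}{2p}$

$$\mathrm{Bias}^2(\hat Q_3) \le C n^{-(2 - p/(1+2ps))}(1 + o(1)). \tag{60}$$

Equations (53) and (60) together yield

$$E_\theta(\hat Q_3 - Q(\theta))^2 \le \mathrm{Bias}^2(\hat Q_3) + \mathrm{Var}(\hat Q_3) \le C n^{-(2 - p/(1+2ps))}(1 + o(1)). \quad □$$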
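
The exponent bookkeeping behind the bias rate can be verified with exact rational arithmetic. The sketch below is a small check, not part of the original proof: it assumes the bias bound has the form $m^{-ps} n^{p/2-1}$ with $m = n^{p/(1+2ps)}$ and confirms that the resulting power of $n$ equals $-(2 - p/(1+2ps))/2$. The function names are hypothetical.

```python
from fractions import Fraction


def bias_exponent(p, s):
    """Power of n in m**(-p*s) * n**(p/2 - 1) when m = n**(p / (1 + 2*p*s))."""
    m_exponent = p / (1 + 2 * p * s)
    return -p * s * m_exponent + p / 2 - 1


def target_exponent(p, s):
    """Half of the claimed mean squared error exponent -(2 - p / (1 + 2*p*s))."""
    return -(2 - p / (1 + 2 * p * s)) / 2


# Exact comparison on a grid of rational (p, s) pairs with 0 < p < 2 and s > 0.
grid = [
    (Fraction(a, 10), Fraction(b, 20))
    for a in range(1, 20)
    for b in range(1, 20)
]
exponents_match = all(bias_exponent(p, s) == target_exponent(p, s) for p, s in grid)
```

Squaring the bias bound therefore gives the same power of $n$ as the variance bound, so neither term dominates at this choice of $m$.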

Proof of Theorem 2. The proof of Theorem 2 is analogous to that of 
Theorem 3 and we only give a brief outline here. Set m = mi = j^g^- Then 

Var «j 2 ) < 5j + ISifi + + ^ . ^ + i8 

<61> 4M ,„ 
< l + ol . 

n 

The maximum squared bias of $\hat Q_2$ is negligible relative to the minimax risk.
This can be shown as follows. As in (55) we have

$$|\mathrm{Bias}(\hat Q_2)| \le \sum_{i=m+1}^{2^{J_*}m} \min\Big(\theta_i^2, \frac{2\tau_i}{n}\Big) + \sum_{i=2^{J_*}m+1}^{\infty} \theta_i^2. \tag{62}$$

With $m = m_2$ and $\alpha p > \frac{1}{2}$, equation (58) yields that for some constant
$C > 0$



2 J *m 



2r i n2\ / /-.„- M „d/2-1 ^ lo g™ 



(63) J] min ( 0? ) < Cm^ 8 ^ 2 ^ 1 = C 

i=m+l 



n J \ n 



up 



For the tail term it follows from (56) that 

oo oo 

£ ^ 2 < E M2-^' s m- 2s 

i=2 J *m+l j=J* 

(64) 

= M(l - 2" 2s )- 1 (2 J *mr 2s < Cn^ 1 / 2 (logn)^ 2s 

for some constant $C > 0$. Equations (63) and (64) yield $\mathrm{Bias}^2(\hat Q_2) = o(\frac{1}{n})$
and Theorem 2 follows. □



Proof of Theorem 4. We shall only consider the case $0 < p < 2$ and
$\alpha < \frac{1}{2p}$ since the information bound given in (17) can otherwise be applied.
The main idea is to inscribe a collection of hyperrectangles inside the
parameter space. A prior then mixes over the vertices of the hyperrectangles
in this collection and a lower bound for the corresponding Bayes risk and
hence minimax risk is given.

Let $\Theta_{k,m}$ be the union of the zero vector $\theta_0 = (0, 0, \ldots)$ and the collection
of vectors which have exactly $k$ nonzero coordinates equal to $\frac{1}{\sqrt{n}}$ in the
first $m$ coordinates and are otherwise equal to zero. We shall write $\Theta_m$ for
$\Theta_{[m^{1/2}],m}$. Now suppose that $\hat Q$ is an estimator which satisfies

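$$E_{\theta_0}(\hat Q - Q(\theta_0))^2 \le c\,\frac{m}{n^2} \tag{65}$$

for some constant $0 < c < \frac{1}{64} e^{1-e}$. We shall now show that in this case

$$\sup_{\theta \in \Theta_m} E_\theta(\hat Q - Q(\theta))^2 \ge \Big(\frac{1}{4} - 2e^{(e-1)/2} c^{1/2}\Big)\frac{m}{n^2}, \tag{66}$$

and hence for some constant $C > 0$,

$$\inf_{\hat Q} \sup_{\theta \in \Theta_m} E_\theta(\hat Q - Q(\theta))^2 \ge C\,\frac{m}{n^2}. \tag{67}$$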

Let $\psi_\mu$ be the density of a univariate normal distribution with mean $\mu$
and variance $\frac{1}{n}$. Let $\mathcal{I}(k, m)$ be the class of all subsets of $\{1, \ldots, m\}$ of $k$
elements and for $I \in \mathcal{I}(k, m)$ let

$$g_I(y_1, \ldots, y_m) = \prod_{j=1}^{m} \psi_{\mu_j}(y_j),$$

where $\mu_j = \frac{1}{\sqrt{n}} 1(j \in I)$. Finally let

$$g = \frac{1}{\binom{m}{k}} \sum_{I \in \mathcal{I}(k,m)} g_I$$

and $f$ be the joint density of $m$ independent normal random variables each
with mean 0 and variance $\frac{1}{n}$. Note that a similar mixture prior was used
in [1] to give lower bounds in a nonparametric testing problem. Now note
that if 



i-fc- ] 

n, 



for all I £ I(k, m), then it follows that 



5-fc- ] 



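We will now apply the constrained risk inequality of Brown and Low [4].
First we need to calculate a chi-squared distance between $f$ and $g$. This is
done as follows. Note that

$$\int \frac{g^2}{f} = \frac{1}{\binom{m}{k}^2} \sum_{I \in \mathcal{I}(k,m)} \sum_{I' \in \mathcal{I}(k,m)} \int \frac{g_I g_{I'}}{f}$$

and simple calculations show that

$$\int \frac{g_I g_{I'}}{f} = e^{j},$$

where $j$ is the number of points in the set $I \cap I'$. It follows that

$$\int \frac{g^2}{f} = E e^{J},$$

where $J$ has the hypergeometric distribution

$$P(J = j) = \frac{\binom{k}{j}\binom{m-k}{k-j}}{\binom{m}{k}}.$$

Now note that from [16], page 59,

$$P(J = j) \le \binom{k}{j}\Big(\frac{k}{m}\Big)^{j}\Big(1 - \frac{k}{m}\Big)^{k-j}\Big(1 - \frac{k}{m}\Big)^{-k}.$$

Now let $k = [m^{1/2}]$. Then for $m \ge 4$,

$$\Big(1 - \frac{k}{m}\Big)^{-k} \le 4$$

and hence

$$P(J = j) \le 4\binom{k}{j}\Big(\frac{k}{m}\Big)^{j}\Big(1 - \frac{k}{m}\Big)^{k-j}.$$

Consequently

$$\int \frac{g^2}{f} = E e^{J} \le 4\Big(1 + (e - 1)\frac{k}{m}\Big)^{k} \le 4e^{e-1}.$$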
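
The chi-squared bound can be checked by computing the hypergeometric moment generating function exactly. The sketch below is illustrative and rests on stated assumptions: $J$ counts the overlap of two uniformly chosen $k$-subsets of $\{1, \ldots, m\}$, $k = [m^{1/2}]$ is taken as the integer part of $\sqrt{m}$, and the function names are hypothetical.

```python
import math


def hypergeom_mgf_at_one(m, k):
    """E[e^J] where J is the overlap of two random k-subsets of {1, ..., m}."""
    total = math.comb(m, k)
    return sum(
        (math.comb(k, j) * math.comb(m - k, k - j) / total) * math.exp(j)
        for j in range(k + 1)
    )


def binomial_style_bound(m, k):
    """The bound 4 * (1 + (e - 1) * k / m) ** k obtained from the binomial comparison."""
    return 4.0 * (1.0 + (math.e - 1.0) * k / m) ** k


universal_bound = 4.0 * math.exp(math.e - 1.0)

checks = []
for m in (4, 10, 100, 1000):
    k = math.isqrt(m)  # integer part of the square root of m
    checks.append(
        hypergeom_mgf_at_one(m, k) <= binomial_style_bound(m, k) <= universal_bound
    )
all_bounds_hold = all(checks)
```

The exact value is typically far below the bound; the point of the bound is only that it stays finite as $m$ grows, which is what the constrained risk inequality needs.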

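It now follows from the constrained risk inequality in [4] that if

$$E_f(\hat Q - Q(\theta_0))^2 \le c\,\frac{m}{n^2} \tag{68}$$

then

$$\begin{aligned}
E_g\Big(\hat Q - \frac{k}{n}\Big)^2 &\ge \frac{k^2}{n^2} - 4e^{(e-1)/2} c^{1/2}\,\frac{m^{1/2} k}{n^2} \\
&\ge \big(1 - 8e^{(e-1)/2} c^{1/2}\big)\frac{k^2}{n^2} \\
&\ge \Big(\frac{1}{4} - 2e^{(e-1)/2} c^{1/2}\Big)\frac{m}{n^2}.
\end{aligned} \tag{69}$$

Hence (67) holds.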

It is now easy to check that $\Theta_m$ as defined above is contained in the $L_p$
ball $L_p(\alpha, M)$ when $m = C n^{p/(1+2ps)}$ for sufficiently small constant $C > 0$.
Hence it directly follows from (67) that

$$\inf_{\hat Q} \sup_{\theta \in L_p(\alpha, M)} E_\theta(\hat Q - Q(\theta))^2 \ge \inf_{\hat Q} \sup_{\theta \in \Theta_m} E_\theta(\hat Q - Q(\theta))^2$$

(70)